Minimum mean square error spectral peak envelope estimation for automatic vowel classification
Abstract
Computing spectral features remains a difficult problem for accurate machine recognition of vowels, especially in the presence of noise or for otherwise degraded acoustic signals. In this work, a new peak envelope method for vowel classification is developed, based on a missing frequency components model of speech recognition. According to this model, vowel recognition depends only on the location of spectral peaks. In addition, the smoothing and interpolation of the sampled spectra, as performed in the cepstral analysis method commonly used in automatic speech recognition, result in a loss of valuable information. The new method for feature extraction presented in this paper is based on minimum mean square error curve fitting of cosine-like basis vectors to all peaks in the speech spectrum. A mathematical model is presented for smoothly tracking spectral envelopes using only spectral peak information and ignoring other parts of the spectrum. A software algorithm for the model was developed and tested for various speaker types using a neural network classifier. Vowel classification experiments were conducted based on the features derived from the spectral peaks. The classification rates of the peak method under various signal-to-noise ratios were also evaluated. The basic conclusion is that the new features perform as well as cepstral features for clean speech, but have advantages when the signal is degraded by noise.
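The curve-fitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the exact basis, the peak-picking rule, or the number of terms, so the plain cosine basis `cos(k*pi*f)`, the three-point local-maximum test, and the default `n_terms=4` are all assumptions made for illustration, as are the function names. The sketch picks the local maxima of a log-magnitude spectrum and solves an ordinary least-squares problem, so that the fitted envelope depends only on the peak samples.

```python
import numpy as np

def find_peaks_simple(log_spec):
    """Return indices of interior local maxima (hypothetical peak picker)."""
    s = np.asarray(log_spec, dtype=float)
    # A sample is a peak if it exceeds its left neighbour and is not
    # smaller than its right neighbour.
    return np.where((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]))[0] + 1

def cosine_basis(freqs_norm, n_terms):
    """Matrix whose column k is cos(k*pi*f), for f in [0, 1] (assumed basis)."""
    f = np.asarray(freqs_norm, dtype=float)[:, None]
    k = np.arange(n_terms)[None, :]
    return np.cos(np.pi * k * f)

def fit_peak_envelope(log_spec, n_terms=4):
    """Least-squares fit of the cosine basis to the spectral peaks only."""
    s = np.asarray(log_spec, dtype=float)
    grid = np.linspace(0.0, 1.0, len(s))      # normalized frequency axis
    peaks = find_peaks_simple(s)
    A = cosine_basis(grid[peaks], n_terms)    # basis sampled at the peaks
    # Minimizes the squared error between A @ coeffs and the peak values.
    coeffs, *_ = np.linalg.lstsq(A, s[peaks], rcond=None)
    envelope = cosine_basis(grid, n_terms) @ coeffs
    return coeffs, envelope, peaks

# Synthetic example: a smooth envelope with harmonic-like ripple below it,
# so the ripple maxima touch the envelope and everything else lies under it.
grid = np.linspace(0.0, 1.0, 201)
true_env = 2.0 + np.cos(np.pi * grid)
log_spec = true_env + 1.5 * (np.cos(40.0 * np.pi * grid) - 1.0)
coeffs, envelope, peaks = fit_peak_envelope(log_spec, n_terms=4)
```

In this sketch the coefficients play the role of the features passed to a classifier. Because the valleys between peaks never enter the fit, energy added between the peaks changes the result only if it creates new local maxima.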
Related articles
Vocal tract normalization based on spectral warping
Two techniques for speaker adaptation based on frequency scale modifications are described and evaluated. In one method, minimum mean square error matching is performed between a spectral template for each speaker and a "typical speaker" spectral template. One parameter, a warping factor, is used to control the spectral matching. In the second method, a neural network classifier is used to adjus...
Speaker Normalization for Improved Automatic Speech Recognition for Digital Libraries
Wei Wang, Old Dominion University, 2004. Director: Dr. Stephen A. Zahorian. The context of the thesis work is the improvement of automatic speech recognition (ASR) for use with digital libraries. First, commonly used multimedia file formats and codecs are surveyed with the objective of identifying those formats t...
Signal adaptive spectral envelope estimation for robust speech recognition
This paper describes a novel spectral envelope estimation technique which adapts to the characteristics of the observed signal. This is possible via the introduction of a second bilinear transformation into warped minimum variance distortionless response (MVDR) spectral envelope estimation. As opposed to the first bilinear transformation, however, which is applied in the time domain, the second...
Discrete weighted mean square all-pole modeling
The paper presents a new method for all-pole model estimation based on minimization of the weighted mean square error in the sampled spectral domain. Due to the discrete nature of the proposed distance measure, emphasis can be put on an arbitrary set of spectral samples, which can greatly improve the model accuracy for periodic signals. Weighting can also be applied to improve the fitting in certain ...
Regional model presentation for peak discharge estimation in ungauged drainage basin using geomorphologic, Synyder, SCS and triangular models (case study: Kan drainage basin)
Given the importance of instantaneous peak discharge estimation for watershed management studies, and because climatic and hydrologic data for estimation and measurement are scarce and of poor quality in countries such as Iran, researchers have had to establish a link between constant (geomorphologic) parameters and (hydrologic) variables in order to present models with minimum dependence on clim...